A New Symbolic Dissimilarity Measure for Multivalued Data Type and Novel Dissimilarity Approximation Techniques

Author

  • Bapu B Kiranagi
Abstract

This paper proposes a new statistical measure for estimating the degree of dissimilarity between two symbolic objects whose features are of multivalued symbolic data type. In addition, it introduces two simple techniques for representing the computed dissimilarity between symbolic objects: interval type and magnitude type. The resulting dissimilarity matrices are not necessarily symmetric. The paper therefore also proposes clustering algorithms that work on such unconventional approximated matrices, introducing the concepts of mutual average dissimilarity value for the interval-type representation and magnitude average dissimilarity for the magnitude-type representation.

Similar resources

Clustering Symbolic Time-Series using L-tuples

Among the many dimensionality reduction methods for time-series data, Symbolic Aggregate approXimation (SAX) is perhaps the most popular due to its simplicity and uniqueness. With SAX, time-series data can be represented as string sequences, which enables the use of methods from text mining and bioinformatics to enhance data mining tasks. We propose an application of L-tuples to impro...

Clustering Gene Expression Data Using Random Forest Dissimilarity

Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These kinds of data typically involve a large number of variables (genes) in comparison with the number of samples (patients). Many clustering methods have been built on the dissimilarity among observations, which is calculated by a distance function. As increa...

Dissimilarity measures for histogram-valued data and divisive clustering of symbolic objects

Contemporary datasets are becoming increasingly larger and more complex, while techniques to analyse them are becoming more and more inadequate. Thus, new methods are needed to handle these new types of data. This study introduces methods to cluster histogram-valued data. However, histogram-valued data are difficult to handle computationally because observations typically have a different numbe...

A Novel Distance Measure for Interval Data

Interval data is attracting attention from the data analysis community due to its ability to describe complex concepts. Since clustering is an important data analysis tool, extending these techniques to interval data is important. Applying traditional clustering methods to interval data loses information inherent in this particular data type. This paper proposes a novel dissimilarity measure w...

A dissimilarity measure for the k-Modes clustering algorithm

Clustering is one of the most important data mining techniques that partitions data according to some similarity criterion. The problems of clustering categorical data have attracted much attention from the data mining research community recently. As the extension of the k-Means algorithm, the k-Modes algorithm has been widely applied to categorical data clustering by replacing means with modes...

Publication date: 2010